Phylogenetic Profiling: How Much Input Data Is Enough?

Authors

  • Nives Škunca
  • Christophe Dessimoz

Abstract

Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Many recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy into the contributions of additional genomes and of additional annotations, we observed diminishing returns from adding more than ~100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to that of changing the number of genomes under comparison. Finally, we show that these findings remain valid in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available from the authors on request.
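To make the basic idea concrete, here is a minimal sketch of phylogenetic profiling, assuming binary presence/absence profiles (one column per genome), Jaccard similarity between profiles, and a simple k-nearest-neighbour transfer of annotations. This is a generic illustration, not the predictor used by the authors; the gene names, the term labels (`term1`, `term2`, `term3`), and the helper functions `jaccard` and `predict_functions` are all hypothetical. In this framing, each additional genome adds a column to every profile, and each additional annotation adds labelled neighbours that predictions can draw on.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def predict_functions(query, profiles, annotations, k=2):
    """Hypothetical k-NN predictor: transfer annotations from the k annotated
    genes whose profiles are most similar to the query's profile.

    Returns a dict mapping term -> score, where the score is the summed
    similarity of the selected neighbours that carry that term."""
    q = profiles[query]
    neighbours = sorted(
        (g for g in annotations if g != query),
        key=lambda g: jaccard(q, profiles[g]),
        reverse=True,
    )[:k]
    scores = Counter()
    for g in neighbours:
        sim = jaccard(q, profiles[g])
        for term in annotations[g]:
            scores[term] += sim
    return dict(scores)


# Toy data (hypothetical): presence (1) / absence (0) across six genomes.
profiles = {
    "geneA": (1, 1, 0, 0, 1, 0),
    "geneB": (1, 1, 0, 0, 1, 1),
    "geneC": (0, 0, 1, 1, 0, 0),
    "query": (1, 1, 0, 0, 1, 0),
}
# Hypothetical functional annotations for the already-characterised genes.
annotations = {
    "geneA": {"term1"},
    "geneB": {"term1", "term2"},
    "geneC": {"term3"},
}

result = predict_functions("query", profiles, annotations, k=2)
print(result)
```

Here the query's profile matches `geneA` exactly (similarity 1.0) and `geneB` closely (0.75), so `term1` scores 1.75 and `term2` scores 0.75, while `term3` from the dissimilar `geneC` is not transferred. Real predictors typically use more sophisticated similarity measures or models, for example ones that account for the phylogeny.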

Similar resources

Sequence Analysis and Phylogenetic Profiling of the Nonstructural (NS) Genes of H9N2 Influenza A Viruses Isolated in Iran during 1998-2007

The earliest evidences on circulation of Avian Influenza (AI) virus on the Iranian poultry farms date back to 1998. Great economic losses through dramatic drop in egg production and high mortality rates are characteristically attributed to H9N2 AI virus. In the present work non-structural (NS) genes of 10 Iranian H9N2 chicken AI viruses collected during 1998-2007 were fully sequenced and subjec...

Protein profiling for phylogenetic relationship in snakehead species

Protein banding pattern of eight snakeheads – Channa species viz., Channa striatus, Channa marulius, Channa punctatus, Channa diplogramme, Channa bleheri, Channa gachua, Channa stewartii and Channa aurantimaculata collected from different regions of India were used to study the phylogenetic relationship among them. The banding pattern from muscle protein indicated a unique profile for each spec...

Treephyler: fast taxonomic profiling of metagenomes

SUMMARY Assessment of phylogenetic diversity is a key element to the analysis of microbial communities. Tools are needed to handle next-generation sequencing data and to cope with the computational complexity of large-scale studies. Here, we present Treephyler, a tool for fast taxonomic profiling of metagenomes. Treephyler was evaluated on real metagenome to assess its performance in comparison...

Estimating the overlap between dependent computations for automatic parallelization

Researchers working on the automatic parallelization of programs have long known that too much parallelism can be even worse for performance than too little, because spawning a task to be run on another CPU incurs overheads. Autoparallelizing compilers have therefore long tried to use granularity analysis to ensure that they only spawn off computations whose cost will probably exceed the spawn-...

Journal volume: 10

Publication date: 2015